New Features in SAS/INSIGHT® in Version 7

Authors

  • Marc-david Cohen
  • Hong Chen
  • Yang Yuan
  • Frederick Wicklin
Abstract

There are significant new features in Version 7 of SAS/INSIGHT. These include new statistical analyses and enhancements to the graphics. SAS/INSIGHT now supports several multivariate statistical techniques, including principal component rotation, canonical correlation analysis, maximum redundancy analysis, and canonical discriminant analysis, and it also supports comparison of means. There are also new robust measures of scale and tests for normality for univariate data, as well as tests for differences of means across groups. New graphical enhancements include 3D surface plots, contour plots, 3D response surfaces, comparison-of-means circles in the box plots, and color blending of up to 5 colors. Several methods for surface fitting are provided, including linear interpolation, thin-plate splines, kernel estimation, and parametric models.

INTRODUCTION

Version 7 contains significant enhancements to SAS/INSIGHT in its statistical functionality and graphical features. This paper highlights the major new features and shows examples of many of them. The first sections discuss the new statistical functions. The later sections highlight the new graphical features, including response surfaces. All the graphs are shown in monochrome, however, so keep in mind that the software supports colors and that many of the features require color to be effective.

DISTRIBUTION ANALYSIS

There are several new univariate statistics in Version 7. These cover

  • basic confidence intervals for the mean, standard deviation, and variance
  • robust measures of scale
  • tests for normality

Under the ANALYZE:DISTRIBUTION (Y) pull-down menu you can specify the statistics of interest via the Output dialog. Figure 1 shows the tables produced when you request robust scale statistics and tests for normality.

MULTIVARIATE ANALYSES

When you choose ANALYZE:MULTIVARIATE (YX) from the pull-down menus, you gain access to a variety of multivariate analyses.
These methods provide a way of examining relationships among a set of variables and between two sets of variables. Earlier versions of SAS/INSIGHT supported only limited multivariate analysis. In Version 7 you can use principal component analysis to examine relationships among several variables, principal component rotation to obtain factors that are more easily interpretable, canonical correlation analysis and maximum redundancy analysis for relationships between two sets of interval variables, and canonical discriminant analysis for relationships between a nominal variable and a set of interval variables. The following table shows the variable requirements of each method.

METHOD                            X's        Y's
Canonical Correlation             Multiple   Multiple
Maximum Redundancy                Multiple   Multiple
Canonical Discriminant Analysis   Multiple   Single Nominal
Principal Component Analysis      None       Multiple

Figure 1. New Univariate Statistics

To demonstrate one of these methods, consider the data from the 1995 U.S. News & World Report survey of American colleges and universities. They include demographic information on tuition, room and board costs, SAT or ACT scores, application/acceptance rates, student/faculty ratio, and graduation rate. If you select ANALYZE:MULTIVARIATE (YX) from the Analyze pull-down menu and press the Output button on the Multivariate dialog, you open the dialog shown in Figure 2. This dialog shows the different multivariate analyses supported by the software. Select the type of analysis you want to perform by marking the appropriate check box, then press the associated button to specify options for that analysis.

Suppose that you want to see how well the out-of-state tuition costs and the student-faculty ratio explain the variation in the quality of the student body, as measured by the entering SAT scores and the percentage of the entering class that is in the top 20 percent of their graduating class. You select the appropriate variables and type of analysis.
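The quantity that a canonical correlation analysis reports for a setup like this one (two X variables, two Y variables) can be illustrated outside SAS/INSIGHT. The following is a minimal sketch, not the SAS/INSIGHT implementation: it assumes exactly two variables in each set, uses the textbook result that the squared canonical correlations are the eigenvalues of Rxx^-1 Rxy Ryy^-1 Ryx, and runs on invented values for eight made-up colleges rather than the U.S. News data. All function and variable names are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((y - mean_b) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def corr_block(cols_a, cols_b):
    """Matrix of correlations between every column in cols_a and cols_b."""
    return [[pearson(a, b) for b in cols_b] for a in cols_a]

def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as a list of two rows."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(p, q):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

def canonical_correlations(x_cols, y_cols):
    """Canonical correlations for two X and two Y variables, largest first."""
    rxx = corr_block(x_cols, x_cols)
    ryy = corr_block(y_cols, y_cols)
    rxy = corr_block(x_cols, y_cols)
    ryx = [list(row) for row in zip(*rxy)]
    # Eigenvalues of Rxx^-1 Rxy Ryy^-1 Ryx are the squared canonical correlations.
    m = matmul(matmul(inverse_2x2(rxx), rxy), matmul(inverse_2x2(ryy), ryx))
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    eigenvalues = [(trace + root) / 2.0, (trace - root) / 2.0]
    # Clamp to [0, 1] to absorb floating-point round-off before the square root.
    return [math.sqrt(min(max(ev, 0.0), 1.0)) for ev in eigenvalues]

# Hypothetical values for eight made-up colleges (NOT the U.S. News data).
tuition = [8.5, 12.0, 15.5, 9.0, 20.0, 18.5, 11.0, 14.0]   # thousands of dollars
ratio = [19, 13, 14, 16, 10, 12, 11, 15]                   # students per faculty
sat = [960, 1050, 1100, 990, 1290, 1210, 1000, 1120]       # entering SAT score
top20 = [30, 40, 45, 22, 75, 60, 38, 52]                   # percent in top 20%

first, second = canonical_correlations([tuition, ratio], [sat, top20])
print(f"canonical correlations: {first:.3f}, {second:.3f}")
```

A first value close to 1 with a much smaller second value would correspond to the situation in which most of the shared variation is carried by the first pair of canonical variables. Significance tests such as those in the SAS/INSIGHT output are not part of this sketch.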
The SAS/INSIGHT output first lists the two sets of variables on which the analysis is performed and then shows scatter plots with 80% confidence ellipses for the variables in each of the two sets. This is similar to the output in Version 6 of SAS/INSIGHT and is not shown here.

Figure 2. Multivariate (YX) Dialog

Figure 3. Univariate Statistics

Next are univariate statistics, the correlation matrix, and the p-values associated with the correlations. Notice that, according to the p-values, all the correlations are significant (see Figure 3).

Next, as shown in Figure 4, the canonical correlations are printed along with eigenvalues from a matrix of cross products, the results of tests of the hypothesis that the canonical correlations are zero, and the correlations of the original variables with the canonical variables. Much more detail can be included in the output, including the canonical coefficients. Notice that the second canonical correlation is not significantly different from zero. This means that much of the variation in the two sets of data can be captured by the first canonical variable. Figure 5 shows a scatter plot of the first canonical variable. This is a plot of the inner products of the two sets of standardized scoring coefficients associated with the first canonical variable and the standardized data.

Figure 4. Canonical Correlation Output

Figure 5. Scatter Plot of Canonical Variable

Biplots

One key new feature in the multivariate analysis is the ability to create 2D and 3D biplots. A biplot is a way of plotting the variables and the observations together in a single plot; observations are points and variables are vectors. In a biplot, the projection of an observation onto a variable shows the contribution of that variable to that observation, and the length of each variable's vector is approximately proportional to the standard deviation of the variable. If the projection is long, then that variable made a large contribution to that observation.
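The projection reading described above can be made concrete with a few lines of arithmetic. This is a minimal sketch under stated assumptions: the 2D coordinates are invented for illustration, and the names `scalar_projection`, `variable_vector`, and `observations` are hypothetical rather than anything SAS/INSIGHT exposes.

```python
import math

def scalar_projection(point, vector):
    """Signed length of the projection of an observation point onto a variable vector."""
    dot = sum(p * v for p, v in zip(point, vector))
    return dot / math.sqrt(sum(v * v for v in vector))

# Hypothetical 2D biplot coordinates, invented for illustration only.
variable_vector = (0.6, 0.8)  # direction of one variable's axis
observations = {
    "obs_a": (3.0, 4.0),    # lies along the variable's direction
    "obs_b": (4.0, -3.0),   # perpendicular to the variable's direction
    "obs_c": (-1.5, -2.0),  # lies opposite the variable's direction
}

for name, point in observations.items():
    print(name, round(scalar_projection(point, variable_vector), 3))
```

An observation with a long positive projection sits far along the variable's axis, one with a projection near zero is close to average on that variable, and a negative projection indicates a below-average value.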
See Gower and Hand (1996) for a thorough discussion.

Consider data from the United States Department of Commerce on the lowest temperatures (in °F) recorded in various months for cities in the US. Can the cities be characterized by these data? It is interesting to fit a two-component model to these data and plot the principal components. Select ANALYZE:MULTIVARIATE (YX) and choose the 4 months as the Y variables. These are the variables containing the lowest temperature for each of four months. The Output button opens the dialog shown in Figure 2, and the Principal Component button opens the dialog shown in Figure 6. Selecting the Biplot radio button produces the desired plot. SAS/INSIGHT shows the length of the vector as a solid line and extends the vector with a dotted line to clarify the vector's direction.

Figure 7 shows a biplot in the principal component space. The output not shown here confirms that the Y variables are well approximated by the biplot. Notice how the plot of the cities in the principal component space shows the division of cities graphically. The West Coast cities are grouped clearly. The biplot shows that Honolulu differentiates itself from the other Pacific Rim cities by its large distance from the July axis.

Figure 6. Principal Component Analysis Output Dialog
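A two-component fit of this kind can be sketched in a few lines. This is an illustration only, not the SAS/INSIGHT algorithm: it assumes the components are extracted from the correlation matrix, uses power iteration with deflation in place of a full eigensolver, and runs on invented temperatures for six made-up cities. The month names other than July, and all function names, are assumptions.

```python
import math

def standardize(col):
    """Center a column and scale it to unit sample standard deviation."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
    return [(v - mean) / sd for v in col]

def correlation_matrix(z_cols):
    """Correlation matrix computed from already standardized columns."""
    n = len(z_cols[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in z_cols]
            for ci in z_cols]

def leading_eigenpair(mat, iters=1000):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix."""
    k = len(mat)
    vec = [float(i + 1) for i in range(k)]  # arbitrary non-symmetric start
    for _ in range(iters):
        w = [sum(mat[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        vec = [x / norm for x in w]
    mv = [sum(mat[i][j] * vec[j] for j in range(k)) for i in range(k)]
    return sum(a * b for a, b in zip(vec, mv)), vec  # Rayleigh quotient

def principal_components(mat, n_components=2):
    """Top eigenpairs via power iteration with deflation."""
    k = len(mat)
    work = [row[:] for row in mat]
    pairs = []
    for _ in range(n_components):
        val, vec = leading_eigenpair(work)
        pairs.append((val, vec))
        # Remove the component just found so the next one becomes dominant.
        work = [[work[i][j] - val * vec[i] * vec[j] for j in range(k)]
                for i in range(k)]
    return pairs

# Hypothetical lowest temperatures (degrees F) for six made-up cities.
cities = ["A", "B", "C", "D", "E", "F"]
temps = {
    "January": [-10, 30, 5, 53, -20, 25],
    "April": [20, 40, 28, 57, 10, 35],
    "July": [50, 45, 55, 67, 40, 60],
    "October": [25, 38, 30, 64, 15, 36],
}

z_cols = [standardize(col) for col in temps.values()]
pairs = principal_components(correlation_matrix(z_cols))

# Scores: each city's standardized values projected on each component.
scores = [[sum(z[i] * v for z, v in zip(z_cols, vec)) for _, vec in pairs]
          for i in range(len(cities))]
for city, (pc1, pc2) in zip(cities, scores):
    print(f"{city}: PC1={pc1:.2f} PC2={pc2:.2f}")
```

Plotting the two score columns against each other gives the observation points of a biplot, and the eigenvector entries for each month give the directions of the variable vectors. The scaling conventions SAS/INSIGHT applies to either are not reproduced here.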

Similar Resources

Compatibility of SAS Data Library

This paper presents an overview of backwards and forwards compatibility between Versions 6 and 7 of the SAS System and is intended to be used as a reference in determining what manner of access is allowed. Furthermore, this paper takes into account differences in compatibility between the Base SAS System and when products such as SAS/CONNECT and SAS/SHARE are used. SAS I/O Services The I/O Se...

Taming the Chaos: Managing Large SAS/AF Applications Using Programming Standards and the Source Control Manager of Version 7 of the SAS System

The use of programming teams offers both advantages and disadvantages when compared with individual programming efforts. The team approach allows a wide range of programming skills and problem-solving perspectives to be applied to a project, and may shorten development time. On the other hand, team-developed projects are often marred by differences in programming styles among developers, resulti...

Motorola’s Engineering Data Analysis System: 10 Years of Analytical Excellence

In an effort to improve semiconductor product yields, an engineering data analysis application was developed in 1989 for engineers at Motorola’s Advanced Products Research and Development Laboratory (APRDL). This application was called EDAS – the Engineering Data Analysis System. What began as a small text-based tool providing a handful of statistical reports used in one laboratory has grown to...

Nickel Base Superalloy Rene®80 – The Effect of High Temperature Cyclic Oxidation on Platinum-Aluminide Coating Features

Nickel base superalloy alloys are used in the manufacture of gas turbine engine components, which in use are exposed to high temperatures and corrosive environments. The platinum aluminide coatings described here have been developed to protect nickel base superalloy alloys from oxidation. In this study, the effect of cyclic oxidation, platinum layer thickness and aluminizing process on beha...

The Power of Hybrid OLAP in a Multidimensional World

Version 8 of the SAS® System brings powerful new features for managing a Hybrid OLAP (HOLAP) or Distributed Multidimensional Data environment. The HOLAP component of the SAS/MDDB® Server software enables you to include SAS Multidimensional databases (MDDB), SAS files, and relational (RDBMS) databases into a single, powerful OLAP reporting environment. Support for HOLAP data groups is fully inte...

Publication date: 1998